NYSED A-Series WWW Server                                   Version 2.0P
CGI Interface

                 COMS GATEWAY INTERFACE (CGI) OVERVIEW

                             May 25, 1999

This document describes the overall design and function of the CGI
Interface for web administrators.

Note that there is a CGI Library, CGILIB, available to dramatically
simplify the necessary code within the CGI applications. The CGILIB and
its usage are described in a separate document and will be of interest
to both administrators and application programmers.

CGI DIAGRAM:

                      -------------------------------------------
                      |                                         |
                      |                 C O M S                 |
                      |        ......................>.         |
                      ---------:----------------------:----------
                   (COMS SEND) :                      :
                               :                      :
                        ---------------         (Environment
                        | WWW Server  |         Information)
                        ---------------               :
                               |                      V
--------------          ---------------         -------------------
|    Web     |          |   Request   |   CGI   |CGI|    COMS     |
|   Client   |<=TCP/IP=>|   Handler   |<=PORT==>|LIB|     CGI     |
| (Browser)  |          |             |  FILE   |   | Application |
--------------          ---------------         -------------------

OVERVIEW:

The CGI interface works as follows:

 1.  The web server request handler receives a request and determines
     that the request matches the COMS ScriptAlias from the
     WWWSERV/CONFIG file.

 2.  The request handler reads the WWWSERV/COMS/CONFIG file to find a
     match for the "application" node of the URL. If a match is found,
     the request handler gets the CGI app's agenda name, window name,
     timeout in seconds, and whether the request contents are to be
     passed as text (EBCDIC) or binary to the CGI app.

 3.  The request handler then creates a COMS message to be sent to the
     CGI app. The COMS message contains "Environment Variables" for the
     request in the same manner as environment variables for a UNIX
     based CGI app.

     One variable, "Port", is the CGI port file name to be used by the
     CGI app for reading/writing from/to the web client via the request
     handler's CGI port. The CGI port name is unique to each request so
     that a CGI application that has gotten out of sync cannot cause
     trouble for subsequent requests.

     Two variables, HEADER_LENGTH and CONTENT_LENGTH, indicate the
     number of bytes in the request header and request contents so that
     the CGI app can correctly read them in from the CGI port file.

     The HTTP/1.1 specification allows request contents to be
     'chunked'. The CGILIB will detect chunked input and handle it.
     (More on chunking below.)

 4.  The request handler causes a CGI_SEND event to notify the Server
     to do a COMS SEND of the environment message on behalf of the
     request handler.

 5.  COMS passes the message to the input queue of the CGI app and
     fires up the application if necessary.

 6.  The CGI app does a RECEIVE to get the environment message. The
     format of the environment message is listed below.

 7.* The CGI app sets the CGI port file name to the value received in
     the COMS message, opens the CGI port file, and then reads in the
     request header and, if it is a POST request, the contents as well.
     (Note that if the CGI app does not open the CGI port file within
     the timeout configured in the WWWSERV/COMS/CONFIG file, the
     request handler will return a "501 Service Unavailable" result to
     the Client.)

 8.* The request contents (POST method) or the QUERY_STRING (GET
     method) may be urlencoded, and it is up to the CGI application to
     do the decoding. Note that the CGILIB provides a routine to
     accomplish this.

 9.* The CGI app now has all information pertaining to the request:
     (1) the environment variables from the COMS message, (2) the
     request header from the client, and (3) the contents from the
     client. At this point the CGI app must sift through all the
     information and extract those parts that directly pertain to the
     current transaction.

10.  The CGI app then processes the transaction, whether it be an
     inquiry or an update transaction.

11.* The CGI app then writes one or more response header fields to the
     CGI port to indicate to the request handler what type of response
     is being returned.
     The response header fields are included in the full response
     header that is returned to the client. Next the CGI app writes a
     "blank line" consisting of a carriage return and line feed
     (standard HTTP) to terminate the response header. Now, if there
     are contents to be returned, the CGI app writes the contents to
     the CGI port file. The CGI app is responsible for generating
     correct HTML code if the response is an HTML document.

12.  While the CGI app is writing the response to the CGI port file,
     the request handler creates a full response header, sends it back
     to the client and writes the contents, if any, to the client as
     they are received from the CGI app.

13.  The CGI app then closes the port file to indicate to the request
     handler the end of the response.

14.  The request handler closes the client connection when all contents
     have been sent.

An * above indicates a step where the CGILIB should be used to simplify
the CGI app.

CGI Features:

  The CGI app has access to the complete request header and is not
  restricted by COMS message sizes. This means the Server does not need
  to be modified for new header fields that may be defined in the
  future.

  The COMS/CONFIG file can specify whether the CGI app wants the
  request and response contents translated between ASCII and EBCDIC.

  The Server guarantees that a response is sent to the client even if
  the CGI app is not able to respond.

CGI COMS ENVIRONMENT MESSAGE LAYOUT:

  CHARS 1-5    LIT "CGI: "
  CHARS 6-9    CGI version (e.g. "0.10")
  CHAR  10     separator character
  CHARS 11-16  LIT "PORT: "
  CHARS 17-24  PORT FILE NAME (9999W999) REQ HAND MIX#, "W", SEQ#
  CHAR  25     separator character
  CHARS 26-36  LIT "PATH_INFO: "
  CHARS 37-?   URL FOLLOWING CGI SCRIPT NAME
               (GOOD TO USE AS A TRANCODE)
  CHAR  ?      separator character
  CHARS ?-?    OTHER CGI ENVIRONMENT VARS., EACH FOLLOWED BY A
               SEPARATOR CHARACTER (e.g. "SERVER_NAME: www.nysed.gov")

CGI ENVIRONMENT VARIABLE FIELDS THAT MAY BE INCLUDED:

  ENV. VARIABLES      SAMPLE VALUES                       COMMENTS
  --------------      -------------                       --------
  SERVER_SOFTWARE     NYSED-A-Series/2.0E BETA
  SERVER_NAME         www.nysed.gov                       AS CONFIGURED
  SERVER_PORT         80                                  AS CONFIGURED
  SERVER_PROTOCOL     HTTP/1.1                            FROM SERVER
  REQUEST_METHOD      GET                                 (OR POST)
  SCRIPT_NAME         /coms/testapp                       FROM REQUEST
  QUERY_STRING        name=Joe+Smith                      MAX 255 CHARS
  PATH_INFO           /MYTRANCODE                         FROM REQUEST
  REMOTE_HOST         ppp1.nysed.gov                      BLANK IF NOT
                                                          AVAILABLE
  REMOTE_ADDR         140.34.166.22                       FROM REQUEST
  REMOTE_PROTOCOL     HTTP/1.0                            FROM REQUEST
  HEADER_LENGTH       823                                 FROM REQUEST
  CONTENT_TYPE        application/x-www-form-urlencoded
  CONTENT_LENGTH *    54                                  FROM REQUEST
  TRANSFER_ENCODING * chunked                             FROM REQUEST
  ORIG_HOST           ASERIESNAME.                        FROM A-SERIES

  * CONTENT_LENGTH or TRANSFER_ENCODING will be present, but NOT both.
    When TRANSFER_ENCODING is present, the contents are chunked, which
    is a new feature of HTTP/1.1 and means that the client did not
    provide a content length in the request header. Only after the
    CGILIB reads the chunked contents can it determine the actual
    size (or determine that the content length exceeds the size of the
    available array space).

RESPONSE MESSAGES FROM THE CGI APPLICATION VIA THE CGI PORT:

The CGI app must send a short header, which may be followed by contents
such as an HTML document.

RESPONSE HEADER FIELDS:

These are sent from the CGI app to the request handler preceding any
contents such as an HTML document. The response header fields
recognized by the NYSED web server are:

  Content-Type:    REQUIRED if there are contents
  Content-Length:  OPTIONAL if there are contents
  Location:        Results in a "302 Move Temp" response
  Status:          **Now supported**

The examples that follow show how this is accomplished:

  1. Content-Type: text/html<CR><LF><CR><LF>HTML Document follows...
  2. Location: http://www.nysed.gov/some/other/doc.html
  3. Status: 205 Reset Content

A pair of <CR><LF>s terminates the response header as per the HTTP
specs.
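
As a rough illustration of the response layout described above, the
following Python sketch assembles the text a CGI app would write to the
CGI port: the header fields, the terminating blank line, then any
contents. Python is used only for illustration; the function name
build_cgi_response is hypothetical and is not part of the CGILIB, and a
real A-Series CGI app would normally rely on the CGILIB services
instead. The sketch ignores any ASCII/EBCDIC translation.

```python
CRLF = "\r\n"  # carriage return + line feed, the standard HTTP line ending


def build_cgi_response(header_fields, contents=""):
    """Return header fields, a blank line, and optional contents as one string.

    header_fields is a list of (name, value) pairs, for example
    [("Content-Type", "text/html")]. This helper is hypothetical.
    """
    names = [name.lower() for name, _ in header_fields]
    # The document states that Content-Type is required when contents follow.
    if contents and "content-type" not in names:
        raise ValueError("Content-Type is required if there are contents")
    lines = [f"{name}: {value}" for name, value in header_fields]
    # Every field ends with CR LF; one extra CR LF ends the response header.
    return "".join(line + CRLF for line in lines) + CRLF + contents


html = "<html><body>Hello</body></html>"
response = build_cgi_response([("Content-Type", "text/html")], html)
redirect = build_cgi_response(
    [("Location", "http://www.nysed.gov/some/other/doc.html")]
)
```

Here response carries an HTML document after the header, while redirect
consists of the Location field and the blank line only.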
If the CGI app does not send a response header to the server, the
server will guarantee that an error response is sent to the client.

HOW THE NYSED WEB SERVER HANDLES RESPONSE HEADER FIELDS:

The response header fields provided by the CGI applications determine
how the web server handles the response. The web server checks for the
header fields in the following order:

  1. If the 'Status' field is present then that is used to indicate
     the result to the client. The web server allows contents to
     follow, but it is up to the client application to know for which
     status codes contents are (and are not) appropriate. Also, there
     must be a 'Content-Type' header field included if there are
     contents.

  2. Otherwise, if the 'Location' field is present then a
     '302 Move Temp' result is returned. The web server allows contents
     to follow, in which case there must be a 'Content-Type' header
     field included.

  3. Otherwise, if the 'Content-Type' field is present then a '200 OK'
     response is sent to the client with the contents that are expected
     to follow. An additional 'Content-Length' header field is
     beneficial but optional.

  4. If none of the above fields is received then the web server
     returns a '501 Service Unavailable' response after the timeout
     configured in the WWWSERV/COMS/CONFIG file.

CGI TIMEOUTS:

The overall transaction timeout is determined by the Timeout setting
configured in the WWWSERV/CONFIG file.

The Application Timeout configured in the COMS/CONFIG file is used as
an 'inactive' timeout for the CGI app.

CHUNKING:

Chunking is a new feature of HTTP/1.1. Its main purpose is to allow the
sender of contents (client or server) to notify the recipient when the
transfer is complete in situations where the sender does not know ahead
of time what the content length will be. This is the case for most CGI
requests.
Briefly, chunking uses a format where the length of each chunk (block)
precedes the chunk data, and the last chunk of data is followed by a
zero length chunk to indicate completion.

Prior to chunking, the sender's only option to indicate completion was
to close the connection. This prevented the possibility of keeping the
connection alive for additional requests.

**NOTE!** Existing CGI applications using the CGI_FORM and WRITE_PORT
services of the CGILIB do *not* require any changes to support
chunking.

The NYSED A-Series CGI interface handles REQUEST chunking as follows:

  1. The web server determines that the request contents on a POST
     request are chunked by the presence of a request header field
     received from the client:

        Transfer-Encoding: chunked

  2. The web server indicates this to the CGI application by including
     the following environment field in the COMS message:

        TRANSFER_ENCODING: chunked

  3. When the CGI application opens the CGIPORT file, the web server
     writes the request header to the port file.

  4. The web server then writes the chunked contents to the CGIPORT
     file, including the chunk headers and the final zero length chunk.
     The chunks are not modified except for conversion to EBCDIC as
     follows:

        The chunk headers, which include the chunk sizes, are converted
        from ASCII to EBCDIC.

        The chunk data is converted from ASCII to EBCDIC if configured
        in the WWWSERV/COMS/CONFIG file; otherwise the original data is
        passed as 'binary' data.

        The end of each chunk (a CR and LF) is converted to EBCDIC.

  5. The CGI application is required to interpret the chunked input.
     (* Note that the CGILIB will do this when the CGI_FORM Service is
     used by the application.) Because of the nature of chunking, the
     application has no way to know the size of the contents until
     AFTER they have been read in. This makes it impossible to know
     ahead of time the array size needed to store the contents.

The NYSED A-Series CGI interface handles RESPONSE chunking as follows:

  1. The CGI application writes the response to the CGIPORT the same
     way that it did prior to chunking.**

  2. The web server knows from the HTTP version of the client whether
     or not the client can support chunking. If the client can support
     chunking (HTTP/1.1), the web server chunks the output from the CGI
     application and sends the contents to the client.

  3. If the output has been chunked, the web server will keep the
     connection open in anticipation of another request from the
     client. If the output has not been chunked, the web server will
     close the connection as was done previously with HTTP/1.0.

  ** If the CGI application includes a Content-Length header field in
     the response, then the web server does *not* chunk the contents
     and does keep the connection open if the client has indicated the
     capability of sending multiple requests on the current connection.
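
The chunk format described in the CHUNKING section can be sketched in a
few lines of Python. This is an illustration of the HTTP/1.1 wire
format only, not the CGILIB implementation: the function names
encode_chunked and decode_chunked are hypothetical, the data is treated
as ASCII bytes (before any conversion to EBCDIC), and chunk sizes are
written in hexadecimal as the HTTP/1.1 specification requires. Trailer
fields after the final chunk are not handled.

```python
def encode_chunked(contents: bytes, chunk_size: int = 1024) -> bytes:
    """Split contents into chunks, each preceded by its length in hex."""
    out = bytearray()
    for start in range(0, len(contents), chunk_size):
        piece = contents[start:start + chunk_size]
        # Chunk header (size in hex), CR LF, chunk data, CR LF.
        out += b"%X\r\n" % len(piece) + piece + b"\r\n"
    # A zero length chunk followed by a blank line marks completion.
    out += b"0\r\n\r\n"
    return bytes(out)


def decode_chunked(data: bytes) -> bytes:
    """Reassemble chunked contents; the total size is known only at the end."""
    body = bytearray()
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)  # end of the chunk header line
        # Ignore any chunk extension that follows a ';' in the header.
        size = int(data[pos:eol].split(b";", 1)[0].strip(), 16)
        pos = eol + 2
        if size == 0:
            break  # the zero length chunk ends the contents
        chunk = data[pos:pos + size]
        if len(chunk) != size:
            raise ValueError("truncated chunk")
        body += chunk
        pos += size
        if data[pos:pos + 2] != b"\r\n":
            raise ValueError("missing CR LF after chunk data")
        pos += 2
    return bytes(body)


wire = b"5\r\nname=\r\n9\r\nJoe+Smith\r\n0\r\n\r\n"
form_data = decode_chunked(wire)
```

Because the decoder grows its buffer as chunks arrive, it mirrors the
point made above: the size of the contents is not known until after
they have been read in.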